Skip to content

Refactor ConvertGenericCallExpr#145

Merged
nunoplopes merged 26 commits into
Cpp2Rust:masterfrom
lucic71:call
Jun 4, 2026
Merged

Refactor ConvertGenericCallExpr#145
nunoplopes merged 26 commits into
Cpp2Rust:masterfrom
lucic71:call

Conversation

@lucic71
Copy link
Copy Markdown
Contributor

@lucic71 lucic71 commented May 23, 2026

ConvertGenericCallExpr is a messy function and it became difficult to add new functoinality on top of it. So I decided to refactor it into: EmitCall(CollectCallInfo(expr)).

In the near future I plan to add 3 new features on top of it:

  1. Don't hoist arguments that are integer literals, they don't alias with any other argument
  2. Don't hoist if all passed arguments are local variables and all have non-reference/non-pointer types, local variables don't alias each other
  3. Translate mapped (variadic) functions that have a direct mapping in libc, for example fcntl -> libc::fcntl

In the new version of ConvertGenericCallExpr, CollectCallInfo is responsible for returning the call information:

  struct CallArg {
    enum class Kind {
      Hoisted,
      Inline,
      Materialized,
    };

    clang::Expr *expr;
    Kind kind;
    std::string param_name;
    clang::QualType param_type;
    bool has_default;
    std::string ref_temp_name;
  };

  struct CallInfo {
    clang::Expr *callee;
    bool is_variadic;
    bool is_fn_ptr_call;
    std::vector<CallArg> args;
    std::vector<clang::Expr *> variadic_args;
  };

And EmitCall is responsible for emitting the hoisted arguments, the callee and the argument list using a CallInfo.

This model makes the addition of new strategies, for example Don't hoist arguments that are integer literals, they don't alias with any other argument, easier because only the CallInfo fields have to be modified.

Comment thread cpp2rust/converter/converter.cpp Outdated
using Kind = CallArg::Kind;

unsigned arg_begin = 0; // skip count for operator()'s implicit object arg
CallInfo info{};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{} unneded

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped

Comment thread cpp2rust/converter/converter.cpp Outdated
}
}

void Converter::EmitCall(CallInfo info) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passed by value

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to CalInfo&& info

Comment thread cpp2rust/converter/converter.h Outdated
Materialized,
};

clang::Expr *expr;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually these are sorted by size (larger to shorter) to reduce struct size.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread cpp2rust/converter/converter.h Outdated
std::optional<TempMaterializationCtx> ConvertCallExpr(clang::CallExpr *expr);

struct CallArg {
enum class Kind {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: int8_t

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread cpp2rust/converter/converter.cpp Outdated
case Kind::Hoisted:
StrCat("let",
std::format("_{}: {}", ca.param_name, ToString(ca.param_type)),
"=");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'='

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with std::format("let {}: {} =")

Comment thread cpp2rust/converter/converter.h Outdated
bool is_variadic;
bool is_fn_ptr_call;
std::vector<CallArg> args;
std::vector<clang::Expr *> variadic_args;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's only up to one variadic arg. this can be a simple pointer initialized to null

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, for example in the following code:

void foo(int a, ...);

foo(1, 2, 3);

=>

args = 1,
variadic_args = 2, 3

And the Rust code is:

fn foo(a: i32, args: &[VaArg])

foo(1, &[2.into(), 3.into()])

std::string param_name;
std::string ref_temp_name;
clang::QualType param_type;
clang::Expr *expr;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch param_type and expr

Copy link
Copy Markdown
Contributor Author

@lucic71 lucic71 May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of both param_type and expr is 8, I inspected using -fdump-record-layouts. Is there other reason for switching?

Dump of -Xclang -fdump-record-layouts
*** Dumping AST Record Layout
         0 | struct cpp2rust::Converter::CallArg
         0 |   class std::basic_string<char> param_name
         0 |     struct std::basic_string<char>::_Alloc_hider _M_dataplus
         0 |       class std::allocator<char> (base) (empty)
         0 |         class std::__new_allocator<char> (base) (empty)
         0 |       pointer _M_p
         8 |     size_type _M_string_length
        16 |     union std::basic_string<char>::(anonymous at /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/basic_string.h:204:7) 
        16 |       char[16] _M_local_buf
        16 |       size_type _M_allocated_capacity
        32 |   class std::basic_string<char> ref_temp_name
        32 |     struct std::basic_string<char>::_Alloc_hider _M_dataplus
        32 |       class std::allocator<char> (base) (empty)
        32 |         class std::__new_allocator<char> (base) (empty)
        32 |       pointer _M_p
        40 |     size_type _M_string_length
        48 |     union std::basic_string<char>::(anonymous at /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/basic_string.h:204:7) 
        48 |       char[16] _M_local_buf
        48 |       size_type _M_allocated_capacity
        64 |   class clang::QualType param_type
        64 |     class llvm::PointerIntPair<class llvm::PointerUnion<const class clang::Type *, const class clang::ExtQuals *>, 3> Value
        64 |       struct llvm::detail::PunnedPointer<class llvm::PointerUnion<const class clang::Type *, const class clang::ExtQuals *> > Value
        64 |         unsigned char[8] Data
        72 |   clang::Expr * expr
        80 |   _Bool has_default
        81 |   Kind kind
           | [sizeof=88, dsize=82, align=8,
           |  nvsize=82, nvalign=8]


struct CallInfo {
clang::Expr *callee;
bool is_variadic;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_variadic isn't equal to !variadic_args.empty()?

Copy link
Copy Markdown
Contributor Author

@lucic71 lucic71 May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, a variadic function call be called with 0 variadic arguments, in which case we need to emit an empty &[VaArgs] like this: foo(0, &[])

}
}

void Converter::EmitCall(CallInfo &&info) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const CallInfo &
The function doesn't consume or take ownership

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be const CallInfo & because EmitHoistedArgs which is called inside EmitCall takes a CallInfo&. But EmitCall can be CallInfo&

@nunoplopes
Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

lucic71 and others added 10 commits June 4, 2026 17:02
Depends on Cpp2Rust#161. See Cpp2Rust#161 for a more detailed description of how goto is
implemented.

On the codegen part, this PR emits the goto_block and goto macros when
it hits functions that use labels and gotos internally. It also hoists
local variables at the start of the function so that all goto_block arms
can see and use the local variables.
Changed the access pattern of flexible arrays from

```rs
let bytes: [u8; 1] = ...;
bytes[10] = ...
```

to

```rs
let bytes: [u8; 1] = ...;
*bytes.as_mut_ptr().add(10) = ...;
```

The original access panics because the bounds are violated, i.e.
accessing the 10th element in an array declared with 1 element. The new
access does not panic, it preserves the C/C++ behavior.
C allows name collision between the following 2 declarations:

```c
struct X {};
typedef enum {} X;
```

The 2 declarations live in different name spaces:

6.2.3 Name spaces of identifiers
[...] there are separate namespaces for various categories of
identifiers, as follows:
1. the tags of structures, unions, and enumerations (disambiguated by
following any) of the keywords struct, union, or enum);
2. all other identifiers, called ordinary identifiers (declared in
ordinary declarators or as enumeration constants).

`struct X {}` lives in namespace 1 and `typedef enum {} X` lives in
namespace 2.

We cannot translate both to `X` in the Rust code. We need to
disambiguate between the 2 names. I chose to translate them as:

```
struct X {} -> X
typedef enum {} X -> X_enum
```

C++ does not have the name space rule for identifiers. I added the
`tag->getASTContext().getLangOpts().CPlusPlus` check in
DisambiguateAnonymousTag to avoid polluting the C++ generated files.

---------

Co-authored-by: Nuno Lopes <nuno.lopes@tecnico.ulisboa.pt>
`"\xff"` or `'\xff'` are valid strings/char literals in C/C++. They were
not escaped and produced an invalid character that broke the
compilation.

After I escaped them, I realized that they cannot be translated to
`\xff` in Rust `str`'s because str only accepts `\x00 - \x7f`. Byte
array accepts `> \x7f`.

I changed Ptr::from_string_literal to accept a byte array instead of a
str. I also changed GetFmtArg to refuse to create Rust `str` that
contains chars `> \x7f` and fallback to GetRawArg that produces escaped
byte arrays.
The generated `Ord::cmp` for method-based `operator<` used
`other.<op>(other)` in its second branch, which can never evaluate true
and therefore suppresses `Ordering::Greater`. This change aligns that
branch with the intended reverse comparison (`other.<op>(self)`),
matching existing model-specific behavior.

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…st#175)

`const char var[] = {"string literal"}` is a valid initialization. Add
support for it in codegen.
`return std::format("anon_enum_{}", GetLineNumber(enum_decl));` results
in name clashes between enums declared in different files at the same
line, I replaced that with `return GetNamedDeclAsString(enum_decl);`
lucic71 and others added 8 commits June 4, 2026 17:06
`libc::strlen(dummy as *const i8) as u64.into()` (without paren) errors
because we can't write `as u64.into()`, we need to wrap the entire arg
in a paren before calling .into(): `(libc::strlen(dummy as *const i8) as
u64).into()`
A continue statement inside a `do while` breakes the iteration and
re-evaluates the condition.

Our previous translation, i.e. 

```rs
loop {
  ...
  if !cond {
    break
  }
}
```

never re-evaluated the condition after a continue, resulting in an
infinite loop.

To fix this, use while instead of loop. This ensures that the condition
is re-evaluated right after the continue:

```rs
let mut __do_while = true;
while __do_while || cond {
  __do_while = false;
  ...
}
```

__do_while makes sure that the first iteration of the loop is always
executed.
…ust#178)

`StrCat` calls were using single-character string literals in many
places (e.g. `"&gt;"`-style delimiters and punctuation), which is
inconsistent with the intended char-based usage for single codepoints.
This updates those call sites to pass character literals instead.

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
@lucic71
Copy link
Copy Markdown
Contributor Author

lucic71 commented Jun 4, 2026

@copilot resolve the merge conflicts in this pull request

Done

@nunoplopes nunoplopes merged commit 073103f into Cpp2Rust:master Jun 4, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants